Abstract. Brizolis asked the question: does every prime p have a pair 
(g,h) such that h is a fixed point for the discrete logarithm with base 
g? This author and Pieter Moree, building on work of Zhang, Cobeli, 
and Zaharescu, gave heuristics for estimating the number of such pairs 
and proved bounds on the error in the estimates. These bounds are not 
descriptive of the true situation, however, and this paper is a first attempt 
to collect and analyze some data on the distribution of the actual error 
in the estimates. 



1 Introduction 

Paragraph F9 of |S] includes the following problem, attributed to Brizolis: given 
a prime p > 3, is there always a pair (g, h) such that g is a primitive root of p, 
1 < h < p - 1, and 

\[ g^h \equiv h \bmod p\,? \tag{1} \]

In other words, is there always a primitive root g such that the discrete logarithm 
log_g has a fixed point? This question has now been settled affirmatively by 
Campbell and Pomerance in [2]. The answer relies on an estimate for the number 
N(p) of pairs (g, h) which satisfy the equation, have g a primitive root, and also 
have h a primitive root which thus must be relatively prime to p - 1. This 
result seems to have been discovered and proved by Zhang in ^U] and later, 
independently, by Cobeli and Zaharescu in [3]. 
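
For small primes, equation (1) can be explored by exhaustive search. The following Python sketch is not taken from the paper; the function names are illustrative, and h is allowed to range over 1, ..., p - 1 (the endpoints never give a solution when g is a primitive root and p > 2, so this agrees with the stricter range).

```python
def prime_factors(n):
    """Return the set of distinct prime factors of n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g, p):
    """True if g generates the multiplicative group modulo the prime p."""
    if g % p == 0:
        return False
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def brizolis_pairs(p):
    """All (g, h) with g a primitive root mod p and g^h = h (mod p)."""
    roots = [g for g in range(1, p) if is_primitive_root(g, p)]
    return [(g, h) for g in roots for h in range(1, p) if pow(g, h, p) == h]


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        print(p, brizolis_pairs(p))
```

For p = 5 the only pair found is (2, 3), since 2^3 = 8 ≡ 3 mod 5. For p = 3 the search comes back empty, which matches the restriction to p > 3 in the problem statement.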

In |H] and 0, Pieter Moree and this author applied the same methods to 
estimate the number of solutions to (1) given no conditions on g and h. Unfortunately, 
the error term involved in this estimate was completely unsatisfactory. 
It was also shown in [8] that for a positive proportion of primes a better error 
estimate can be obtained, and it was conjectured that one could do even better. 
The object of this note is to collect and analyze some data on the distribution 
of the actual error in these estimates. 

The idea of repeatedly applying the function x ↦ g^x mod p is used in the 
famous cryptographically secure pseudorandom bit generator of Blum and Mi- 
cali. (pQ; see also [5] and among others, for further developments.) If one 
could predict that a pseudorandom generator was going to fall into a fixed point 
or cycle of small length, this would obviously be detrimental to cryptographic 
security. We hope that the investigation of the cycle structure of the discrete 
logarithm will therefore eventually be of some use to those interested in the field 
of cryptography. 
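
To make the notion of "falling into a fixed point or cycle of small length" concrete, here is a minimal Python sketch that iterates x ↦ g^x mod p from a starting value and reports how long the orbit runs before it enters a cycle and how long that cycle is. It only illustrates the underlying iteration, not the Blum-Micali generator itself; the function name and the dictionary-based cycle detection are choices made for this example.

```python
def orbit_tail_and_cycle(g, p, start):
    """Iterate x -> g^x mod p from `start`.

    Returns (tail_length, cycle_length): the number of steps before the
    orbit first enters its cycle, and the length of that cycle.
    """
    first_seen = {}            # value -> step at which it first appeared
    x, step = start, 0
    while x not in first_seen:
        first_seen[x] = step
        x = pow(g, x, p)
        step += 1
    return first_seen[x], step - first_seen[x]


if __name__ == "__main__":
    # With g = 2 and p = 5, the value 3 is a fixed point (2^3 = 8 = 3 mod 5),
    # while 1 -> 2 -> 4 -> 1 is a cycle of length 3.
    print(orbit_tail_and_cycle(2, 5, 3))
    print(orbit_tail_and_cycle(2, 5, 1))
```

A cycle length of 1 corresponds exactly to a solution of equation (1) with h equal to the value at which the orbit settles.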

Using the same notation as in the previously cited papers, we will refer to 
an integer which is a primitive root modulo p as PR and an integer which is 
relatively prime to p — 1 as RP. An integer which is both will be referred to as 
RPPR and one which has no restrictions will be referred to as ANY. 

All integers will be taken to be between 1 and p - 1, inclusive, unless stated 
otherwise. If N(p) is, as above, the number of solutions to (1) such that g is a 
primitive root and h is a primitive root which is relatively prime to p - 1, then 
we will say $N(p) = F_{g\,\mathrm{PR},\,h\,\mathrm{RPPR}}(p)$, and similarly for other conditions. We will 
use d(n) for the number of divisors of n and σ(n) for the sum of the divisors of 
n. All other notations should be fairly standard. 



2 Heuristics, Conjectures, and Previous Results 

The fundamental observation at the heart of the estimation of $F_{g\,\mathrm{PR},\,h\,\mathrm{RPPR}}(p)$ is 
that if h is a primitive root modulo p which is also relatively prime to p - 1, then 
there is a unique primitive root g satisfying (1), namely $g = h^{\bar h}$ reduced modulo 
p, where $\bar h$ denotes the inverse of h modulo p - 1 throughout this note. Thus to 
estimate N(p), we only need to count the number of such h; g no longer has to 
be considered. We observe that there are φ(p - 1) possibilities for h which are 
relatively prime to p - 1, and we would expect each of them to be a primitive 
root with probability φ(p - 1)/(p - 1). This heuristic uses the assumption that 
the condition of being a primitive root is in some sense "independent" of the 
condition of being relatively prime. 
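
The construction of g from h, and the resulting heuristic count φ(p - 1)^2/(p - 1), can be tried out numerically. The sketch below is an illustration under the definitions above, with hypothetical function names; it uses Python's three-argument `pow` with exponent -1 (available from Python 3.8) to compute the inverse of h modulo p - 1.

```python
from math import gcd


def prime_factors(n):
    """Return the set of distinct prime factors of n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g, p):
    """True if g generates the multiplicative group modulo the prime p."""
    return g % p != 0 and all(
        pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1)
    )


def euler_phi(n):
    """Euler's totient function via the prime factorisation of n."""
    result = n
    for q in prime_factors(n):
        result -= result // q
    return result


def rppr_solutions(p):
    """Pairs (g, h) with h RPPR and g = h^(h_bar) mod p, h_bar = h^(-1) mod (p-1)."""
    pairs = []
    for h in range(1, p):
        if gcd(h, p - 1) == 1 and is_primitive_root(h, p):
            h_bar = pow(h, -1, p - 1)       # inverse of h modulo p - 1
            pairs.append((pow(h, h_bar, p), h))
    return pairs


def heuristic_count(p):
    """The heuristic estimate phi(p-1)^2 / (p-1) for the number of such pairs."""
    return euler_phi(p - 1) ** 2 / (p - 1)


if __name__ == "__main__":
    for p in (101, 211, 1009):
        print(p, len(rppr_solutions(p)), round(heuristic_count(p), 2))
```

Each returned g satisfies g^h ≡ h mod p because h times its inverse is congruent to 1 modulo p - 1, and g is again a primitive root because the exponent is a unit modulo p - 1.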

We will actually need the following slightly more general heuristic: 

Heuristic 1 (Heuristic 2.6 of [8]). The order of x modulo p is independent 
of the greatest common divisor of x and p - 1, in the sense that for all p, and 
all divisors e and f of p - 1, 

\[ \frac{\#\{x \in \{1,\dots,p-1\} : \gcd(x,p-1) = e,\ \mathrm{ord}_p(x) = \frac{p-1}{f}\}}{\#\{x \in \{1,\dots,p-1\} : \gcd(x,p-1) = e\}} \approx \frac{1}{p-1}\,\#\{x \in \{1,\dots,p-1\} : \mathrm{ord}_p(x) = \tfrac{p-1}{f}\}. \]
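
Heuristic 1 compares two proportions: among the x with gcd(x, p - 1) = e, the fraction having order (p - 1)/f, and among all x in {1, ..., p - 1}, the fraction having that order. A small Python sketch (function names are illustrative, and the naive order computation is only suitable for small p) lets one see how close the two sides are for particular p, e and f.

```python
from math import gcd


def multiplicative_order(x, p):
    """Order of x in the multiplicative group modulo the prime p (naive)."""
    order, y = 1, x % p
    while y != 1:
        y = y * x % p
        order += 1
    return order


def heuristic_sides(p, e, f):
    """Return (lhs, rhs) of Heuristic 1 for divisors e and f of p - 1."""
    target = (p - 1) // f
    xs = range(1, p)
    with_gcd = [x for x in xs if gcd(x, p - 1) == e]
    both = [x for x in with_gcd if multiplicative_order(x, p) == target]
    with_order = [x for x in xs if multiplicative_order(x, p) == target]
    return len(both) / len(with_gcd), len(with_order) / (p - 1)


if __name__ == "__main__":
    p = 211
    for e, f in [(1, 1), (2, 1), (3, 2), (5, 7)]:
        lhs, rhs = heuristic_sides(p, e, f)
        print(f"e={e} f={f} lhs={lhs:.4f} rhs={rhs:.4f}")
```

The list `with_gcd` is never empty when e divides p - 1, since x = e itself qualifies.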



The following lemma makes this heuristic rigorous; it was stated and proved 
in [S] using the ideas in 

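Lemma 1 (Lemma 2.7 of [8]). Let e and f be divisors of p - 1, and N a 
multiple of p - 1. Let V = {1, ..., N} and 

\[ V' = \left\{ x \in V : \gcd(x,p-1) = e,\ \mathrm{ord}_p(x) = \frac{p-1}{f} \right\}. \]

Then 

\[ \left| \#V' - \frac{N}{(p-1)^2}\,\varphi\!\left(\frac{p-1}{e}\right)\varphi\!\left(\frac{p-1}{f}\right) \right| \le d\!\left(\frac{p-1}{e}\right) d\!\left(\frac{p-1}{f}\right) \sqrt{p}\,(1+\ln p) \le d(p-1)^2 \sqrt{p}\,(1+\ln p). \]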



Using this lemma with e = f = 1 it is straightforward to prove Cobeli and 
Zaharescu's version of Zhang's result. 



Theorem 2 (Theorem 1 of [3]). 

\[ \left| F_{g\,\mathrm{PR},\,h\,\mathrm{RPPR}}(p) - \frac{\varphi(p-1)^2}{p-1} \right| \le d(p-1)^2 \sqrt{p}\,(1+\ln p). \]



For the situation with no conditions on g and h, we see that (1) can be solved 
exactly when gcd(h, p - 1) = e and h is an e-th power modulo p, and in fact there 
are exactly e such solutions. Thus 

\[ F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) = \sum_{e \mid p-1} e\,T(e,p), \tag{2} \]

where 

\[ T(e,p) = \#\{ h \in \{1,\dots,p-1\} : \gcd(h,p-1) = e \text{ and } h \text{ is an } e\text{-th power residue modulo } p \}. \]
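
The decomposition of the unrestricted count into a sum over divisors e of p - 1 can be checked against a direct count for small primes. In the Python sketch below (names are illustrative), the test for "h is an e-th power residue modulo p" uses the standard criterion h^((p-1)/e) ≡ 1 mod p, valid because e divides p - 1.

```python
from math import gcd


def count_fixed_points_brute(p):
    """Number of pairs (g, h), 1 <= g, h <= p-1, with g^h = h (mod p)."""
    return sum(
        1 for g in range(1, p) for h in range(1, p) if pow(g, h, p) == h
    )


def T(e, p):
    """#{h in 1..p-1 : gcd(h, p-1) = e and h is an e-th power residue mod p}."""
    return sum(
        1
        for h in range(1, p)
        if gcd(h, p - 1) == e and pow(h, (p - 1) // e, p) == 1
    )


def count_fixed_points_via_T(p):
    """The same count, computed as the sum of e * T(e, p) over e dividing p-1."""
    return sum(e * T(e, p) for e in range(1, p) if (p - 1) % e == 0)


if __name__ == "__main__":
    for p in (5, 7, 11, 13, 101):
        print(p, count_fixed_points_brute(p), count_fixed_points_via_T(p))
```

The brute-force count costs on the order of p^2 modular exponentiations, while the divisor-sum form needs only about p for each divisor, which is what makes larger primes tractable.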
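Applying the lemma with e = f = 1 gives 

Proposition 1 (Proposition 4.2 of [8]). Let e | p - 1. Then 

(a) \[ \left| T(e,p) - \frac{1}{e}\,\varphi\!\left(\frac{p-1}{e}\right) \right| \le d\!\left(\frac{p-1}{e}\right) \sqrt{p}\,(1+\ln p). \]

(b) $T(1,p) = \varphi(p-1)$. 

(c) $T(p-1,p) = T\!\left(\frac{p-1}{2},p\right) = 0$. 

(d) $0 \le T(e,p) \le \varphi\!\left(\frac{p-1}{e}\right)$. 

(e) \[ \left| F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) - (p-3) \right| \le d(p-1)\left(\sigma(p-1) - \tfrac{3}{2}(p-1)\right) \sqrt{p}\,(1+\ln p). \]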

Unfortunately, the "error" term in Part (e) will be larger than the main 
term for infinitely many p. Using the deep result of Fouvry (see, e.g., U) that 
≫ x/ln x primes p ≤ x are such that p - 1 has a prime factor larger than p^{0.6687}, 
it was proved that: 



Theorem 3 (Theorem 4.8 of [8]). There are ≫ x/ln x primes p ≤ x such 
that 

\[ F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) = (p-1) + O\!\left(p^{5/6}\right). \]

More specifically, there are ≫ x/ln x primes p ≤ x such that 

\[ \left| F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) - (p-1) \right| \le p^{0.8313}\, d(p-1)^2\, (2+\ln p). \]

It was also noted in [8] that if Fouvry's assertion holds true with 0.6687 
replaced by some larger θ (up to θ = 3/4), then in Theorem 3 the exponents 5/6 
and 0.8313 can be replaced by 3/2 - θ + δ and 3/2 - θ for any δ > 0. 

On the other hand, we also expect that for many primes the error term 
cannot be set too small. According to Heuristic 1 we can model T(e,p) using a 
set of independent random variables $X_1, \dots, X_{p-1}$ such that 

\[ X_h = \begin{cases} \gcd(h,p-1) & \text{with probability } 1/\gcd(h,p-1), \\ 0 & \text{otherwise.} \end{cases} \]

Then the heuristic suggests that $F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p)$ is approximately equal to the 
expected value of $X_1 + \cdots + X_{p-1}$, which is clearly p - 1. On the other hand, 
the variance σ^2 is the expected value of 

\[ \left( X_1 + \cdots + X_{p-1} - (p-1) \right)^2. \]

Note that the expected value of $X_h X_j$ is gcd(h, p - 1) if h = j and 1 otherwise. 
Using this, an easy computation shows that 

\[ \sigma^2 = \sum_{h=1}^{p-1} \gcd(h,p-1) - (p-1) = \sum_{d \mid p-1} d\,\varphi\!\left(\frac{p-1}{d}\right) - (p-1). \]

In particular, the standard deviation σ is less than p^{1/2+ε} for every ε > 0 (for 
sufficiently large p). Thus we have the following: 

Conjecture 1 (Conjecture 3.6 of Iffl). There are o(x/ln x) primes p ≤ x for which 

\[ \left| F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) - (p-1) \right| > p^{1/2+\varepsilon} \]

for every ε > 0. 
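
The variance of the random model behind this conjecture is the sum of gcd(h, p - 1) over h = 1, ..., p - 1, minus (p - 1), which can equivalently be written as a sum of d·φ((p - 1)/d) over the divisors d of p - 1, minus (p - 1). The Python sketch below (illustrative names, naive totient) evaluates both forms so that the identity and the size of the standard deviation relative to √p can be inspected for small primes.

```python
from math import gcd, sqrt


def euler_phi(n):
    """Euler's totient function, counted directly (fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def variance_by_gcd_sum(p):
    """sum_{h=1}^{p-1} gcd(h, p-1) - (p-1)."""
    return sum(gcd(h, p - 1) for h in range(1, p)) - (p - 1)


def variance_by_divisor_sum(p):
    """sum_{d | p-1} d * phi((p-1)/d) - (p-1)."""
    n = p - 1
    return sum(d * euler_phi(n // d) for d in range(1, n + 1) if n % d == 0) - n


if __name__ == "__main__":
    for p in (7, 101, 211, 1009):
        var = variance_by_gcd_sum(p)
        assert var == variance_by_divisor_sum(p)
        print(p, var, round(sqrt(var), 3), round(sqrt(p), 3))
```

For p = 7 both forms give 15 - 6 = 9, so the model's standard deviation is 3.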



3 Data and Analysis 

Since a factor of the form p^α dominates all of the proven and conjectured 
bounds on the error given above, we decided to collect data on the values of 
$\delta = F_{g\,\mathrm{ANY},\,h\,\mathrm{ANY}}(p) - (p-3)$ for the first 1800 primes (3 through 15413). 
The data was then tallied based on the value of log_p |δ|. Table 1 and Figure 1 



Table 1. Values of δ > 0 for 3 ≤ p ≤ 15413 

log_p |δ|   0-1/6   1/6-1/3   1/3-1/2   1/2-2/3   2/3-5/6   5/6-1   total
# of p         23        69       285       353        65       1     796



Fig. 1. Plot of values of δ > 0 for 3 ≤ p ≤ 15413 
Table 2. Values of δ < 0 for 3 ≤ p ≤ 15413 

log_p |δ|   0-1/6   1/6-1/3   1/3-1/2   1/2-2/3   2/3-5/6   5/6-1   total
# of p         17        78       316       542        51       0    1004



give the data for δ > 0, while Table 2 and Figure 2 give the data for δ < 0. The 
case δ = 0 did not actually occur in this sample. Likewise, there were no cases 
where |δ| > p, although this is certainly not ruled out for δ > 0. 
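
A tally of this kind can be sketched in a few lines of Python. This is not the program used to produce the tables: the bin boundary convention, the clamping of any value of log_p |δ| at or above 1 into the last bin, and the separate counter for δ = 0 are all assumptions made for this illustration, so its output need not agree with the tables entry for entry.

```python
from math import gcd, log


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]


def fixed_point_count(p):
    """Number of (g, h) in [1, p-1]^2 with g^h = h (mod p).

    Each h with gcd(h, p-1) = e contributes e solutions if h is an
    e-th power residue modulo p, and none otherwise.
    """
    total = 0
    for h in range(1, p):
        e = gcd(h, p - 1)
        if pow(h, (p - 1) // e, p) == 1:
            total += e
    return total


def tally(primes, bins=6):
    """Tally log_p|delta| into equal-width bins on [0, 1), split by sign of delta."""
    positive, negative, zero = [0] * bins, [0] * bins, 0
    for p in primes:
        delta = fixed_point_count(p) - (p - 3)
        if delta == 0:
            zero += 1                      # log is undefined; counted separately
            continue
        t = log(abs(delta)) / log(p)
        idx = min(int(t * bins), bins - 1)  # assumption: clamp t >= 1 into last bin
        (positive if delta > 0 else negative)[idx] += 1
    return positive, negative, zero


if __name__ == "__main__":
    pos, neg, zero = tally(p for p in primes_up_to(2000) if p >= 3)
    print("delta > 0:", pos)
    print("delta < 0:", neg)
    print("delta = 0:", zero)
```

Because `fixed_point_count` needs only one modular exponentiation per h, a single prime of the size considered here takes a fraction of a second even in pure Python.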

It is not clear whether the greater number of negative values of δ is significant, 
or a coincidence of this particular data set. The mean of log_p |δ| for Table 1 is 0.4943 and 
the mean for Table 2 is 0.5050. This may reflect the same apparent bias towards 
negative values of δ. 

Table 3 and Figure 3 give the values of |δ| for all computed values of δ. 
The mean for this table is 0.5003, which suggests that the expected value of 
log_p |δ| may in fact be 1/2, i.e., that the values of |δ| may cluster around √p. 



Fig. 2. Plot of values of δ < 0 for 3 ≤ p ≤ 15413 
It is not immediately clear how to derive this from the heuristics. The sample 
standard deviation can be calculated to be 0.1374, but the data does not appear 
to be precisely normally distributed. This is confirmed by a chi-squared test for 
goodness of fit, which returns the extremely small p-value of 7.8039 · 10^{-34}.¹ 
A sample skewness of -0.6785 and a sample kurtosis of 3.6516 can also be 
computed. This reflects an asymmetric longer left tail (toward smaller values of 
log_p |δ|) and a somewhat sharper peak than a normal distribution. 
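
The summary statistics quoted here can be approximated from grouped data. The Python sketch below assumes, as a working hypothesis, that each prime is represented by the midpoint of its class interval and weighted by the tallies for all computed values of δ (40, 147, 601, 895, 116, 1 over the six intervals of width 1/6). Under that assumption the mean and sample standard deviation come out as 0.5003 and 0.1374 to four decimals; the skewness and kurtosis land close to the quoted values, with small differences that depend on the bias-correction convention used.

```python
from math import sqrt

# Class midpoints of log_p|delta| for the intervals 0-1/6, ..., 5/6-1
MIDPOINTS = [(2 * i + 1) / 12 for i in range(6)]
# Tallies for all computed values of delta (both signs combined)
COUNTS = [40, 147, 601, 895, 116, 1]


def grouped_moments(midpoints, counts):
    """Mean, sample standard deviation, skewness and kurtosis of grouped data.

    The standard deviation uses the n - 1 divisor; skewness and kurtosis
    are the plain moment ratios m3 / m2^1.5 and m4 / m2^2.
    """
    n = sum(counts)
    mean = sum(c * m for m, c in zip(midpoints, counts)) / n

    def central(k):
        return sum(c * (m - mean) ** k for m, c in zip(midpoints, counts)) / n

    m2, m3, m4 = central(2), central(3), central(4)
    sample_std = sqrt(m2 * n / (n - 1))
    return mean, sample_std, m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    mean, std, skew, kurt = grouped_moments(MIDPOINTS, COUNTS)
    print(f"mean={mean:.4f} std={std:.4f} skew={skew:.4f} kurt={kurt:.4f}")
```

A negative skewness and a kurtosis above 3 are what the text describes as a longer left tail and a sharper peak than a normal distribution.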



Table 3. All values of |δ| for 3 ≤ p ≤ 15413 

log_p |δ|   0-1/6   1/6-1/3   1/3-1/2   1/2-2/3   2/3-5/6   5/6-1   total
# of p         40       147       601       895       116       1    1800



1 The p-value here can be interpreted as the chance that a random sample taken from 
the predicted distribution would deviate from the distribution as a whole at least as 
much as the observed data did. Thus this set of data is an extremely bad match for 
the prediction. We are using statistical language in this note even though the data 
sets do not come from random variables, and are in fact deterministic. Thus, all of 
the statistical results in this note should be taken with a very large grain of salt. 



Fig. 3. Plot of all values of |δ| for 3 ≤ p ≤ 15413 
A log-normal distribution was also investigated by taking the exponential 
function of the midpoints of each of the class intervals. This resulted in a mean 
of 1.6643, a sample standard deviation of 0.2196, a sample skewness of -0.2366 
and a sample kurtosis of 3.2065. Thus the shape of the distribution looks more 
like a normal distribution; however the chi-squared goodness of fit test still gives 
an extremely small p-value of 2.2243 · 10^{-10}. Thus this still does not seem to be 
the correct distribution. More investigation is clearly necessary, both theoretical 
and statistical. 
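
The transformation described above is simple to reproduce with the standard library. The Python sketch below expands the grouped tallies for all computed values of δ into one exponentiated midpoint per prime and takes the mean and sample standard deviation; the variable names are illustrative, and only these two statistics are computed.

```python
from math import exp
from statistics import mean, stdev

# Tallies over the six class intervals of width 1/6 (all values of delta)
COUNTS = [40, 147, 601, 895, 116, 1]
midpoints = [(2 * i + 1) / 12 for i in range(6)]

# One exp(midpoint) per tallied prime
transformed = [exp(m) for m, c in zip(midpoints, COUNTS) for _ in range(c)]

exp_mean = mean(transformed)
exp_std = stdev(transformed)      # sample standard deviation (n - 1 divisor)

if __name__ == "__main__":
    print(f"mean={exp_mean:.4f} std={exp_std:.4f}")
```

With these inputs the two values round to 1.6643 and 0.2196.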

The data sets from the tables were collected on a Beowulf cluster, using 
16 nodes, each consisting of 2 Pentium III processors running at 1 GHz. The 
programming was done in C, using MPI, OpenMP, and OpenSSL libraries. The 
collection took approximately 60 hours for the 1800 primes between 3 and 15413 
(inclusive). 



4 Conclusion and Future Work 



This note is clearly a preliminary effort. The fact that we were unable to interpret 
the data as any sort of normal distribution is unsatisfying, if not perhaps 
surprising. We hope in the future to provide at least a conjectural explanation of 
this data. A better theoretical understanding of the error terms in the theorems 
we have cited would of course be helpful in this. 



The project of extending our analysis to three-cycles and more generally k-cycles 
for small values of k, mentioned in previous papers, still remains to be 
done. Along similar lines, Igor Shparlinski has suggested attempting to analyze 
the average length of a cycle. Daniel Cloutier, a student at the Rose-Hulman 
Institute of Technology, has recently begun to collect data which we hope will 
shed light on both of these problems. 